6
Extremely Fast Sequence Comparisons Identify All the Molecules That Are Present in the Cell
Abstract
With the BLAST server at NCBI (National Center for Biotechnology Information), you
can get an answer within seconds to a few minutes. This is made possible by searches
that are fast but not entirely accurate (heuristic searches). Almost all of the fast
bioinformatics programs on the net use such heuristics. In BLAST, for example, a
database entry is first pretested for two short but perfect matches before an exact
alignment with that entry is performed, which saves a lot of computing time. A further
time saver is indexing the database (after all, you also find a topic in this book
much faster via the table of contents than by browsing through it). Besides speed,
sensitivity (do I recognize all relevant entries?) and specificity (do I not get too
many irrelevant entries?) are also important for a good heuristic search.
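The two ideas named above, an index over the database and a pretest for two short perfect matches, can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the BLAST implementation: the function names, the toy sequences, the word length `k=3` and the `window=40` distance are all chosen here for illustration, and the sketch only counts exact words, without scoring matrices or hit extension.

```python
from collections import defaultdict
from itertools import combinations


def build_word_index(database, k=3):
    """Index every k-letter word: word -> list of (entry_id, position)."""
    index = defaultdict(list)
    for entry_id, sequence in database.items():
        for pos in range(len(sequence) - k + 1):
            index[sequence[pos:pos + k]].append((entry_id, pos))
    return index


def two_hit_candidates(query, index, k=3, window=40):
    """Return the entries that contain two non-overlapping exact word
    matches on the same diagonal, at most `window` letters apart."""
    hits = defaultdict(list)  # (entry_id, diagonal) -> query positions
    for qpos in range(len(query) - k + 1):
        word = query[qpos:qpos + k]
        # Look the word up in the index instead of scanning every entry.
        for entry_id, dpos in index.get(word, ()):
            hits[(entry_id, dpos - qpos)].append(qpos)

    candidates = set()
    for (entry_id, _diagonal), positions in hits.items():
        if any(k <= b - a <= window for a, b in combinations(positions, 2)):
            candidates.add(entry_id)
    return candidates


# Hypothetical toy database (made-up sequences, for illustration only).
database = {
    "toy_entry_A": "MKTAYIAKQRQISFVKSHFSRQ",  # shares a long stretch with the query
    "toy_entry_B": "GGGGGGGGGGGGGGGG",        # shares no word with the query
    "toy_entry_C": "MKTWWWWWWWWW",            # shares only a single word
}
index = build_word_index(database)
candidates = two_hit_candidates("MKTAYIAKQR", index)
print(candidates)
```

Only the entries returned by `two_hit_candidates` would then be passed on to the slower exact alignment. In this toy case that is one entry out of three, which is where the time saving comes from.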
How and why do bioinformatic analyses actually work? A very basic step towards
understanding this is to know which biomolecule you have in front of you. For this
purpose, bioinformatics analyzes the molecular sequence. It is important to remember
that we first need the experimentally determined sequence. On its own, however, this
sequence does not tell us which molecule is present. This can be solved by comparing
the sequence with all entries in a database (cf. Chap. 1). The interesting thing is
that bioinformatics has developed very fast computational recipes (algorithms) for
this task. This was necessary because sequence databases have grown so quickly that
we are now dealing with many millions of stored sequences and many billions of stored
letters. How do you speed up bioinformatics algorithms so that they can cope with
these large amounts of data?
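To make the starting point concrete, the following Python sketch identifies an unknown sequence by comparing it with every entry of a database, one after the other. It is only a sketch under assumptions: the entry names and sequences are invented, and the similarity measure is `difflib.SequenceMatcher` from the Python standard library, used here as a simple stand-in for a real biological alignment score.

```python
from difflib import SequenceMatcher


def identify_by_full_scan(query, database):
    """Compare the query with every database entry and return the
    best-matching entry name together with its similarity (0.0 to 1.0)."""
    best_name, best_score = None, 0.0
    for name, sequence in database.items():
        # Stand-in similarity, not a biological alignment score.
        score = SequenceMatcher(None, query, sequence, autojunk=False).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical toy database (made-up names and sequences).
database = {
    "toy_protein_1": "MKTAYIAKQRQISFVKSHFSRQ",
    "toy_protein_2": "GSHMLEDPVDAFKLNQWERT",
    "toy_protein_3": "MALWMRLLPLLALLALWGPD",
}

# The query differs from toy_protein_1 by a single letter.
best_name, best_score = identify_by_full_scan("MKTAYIAKQRQLSFVKSHFSRQ", database)
print(best_name, round(best_score, 2))
```

Every entry is compared in full, so the running time grows with the total number of stored letters. With billions of letters this becomes too slow, which is why faster recipes are needed.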
© Springer-Verlag GmbH Germany, part of Springer Nature 2023
T. Dandekar, M. Kunz, Bioinformatics,
https://doi.org/10.1007/978-3-662-65036-3_6